Clustering Heterogeneous Data Using Clustering by Compression
Authors
Abstract
Nowadays, we have to deal with a large quantity of unstructured data produced by a number of sources. Applying clustering to the World Wide Web is essential to getting structured information in response to user queries. In this paper, we test the results of a new clustering technique – clustering by compression – when applied to heterogeneous sets of data. The clustering by compression procedure is based on a parameter-free, universal similarity distance, the normalized compression distance (NCD), computed from the lengths of compressed data files (singly and in pairwise concatenation).
Key-Words: clustering heterogeneous data, clustering by compression, FScore
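As an illustration of how a distance can be computed from compressed lengths, here is a minimal sketch. It assumes the commonly cited form of the NCD, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(·) is the compressed length, and it uses zlib as a stand-in compressor. The abstract does not say which compressor the authors used, and the helper names `compressed_len`, `ncd` and `ncd_matrix` are hypothetical.

```python
import zlib


def compressed_len(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (zlib is an assumed stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance from single and concatenated compressed lengths."""
    cx = compressed_len(x)
    cy = compressed_len(y)
    cxy = compressed_len(x + y)  # pairwise concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_matrix(items: list[bytes]) -> list[list[float]]:
    """Pairwise NCD values for a list of byte strings."""
    return [[ncd(a, b) for b in items] for a in items]


if __name__ == "__main__":
    # Illustrative inputs only: two English sentences and one HTML-like fragment.
    docs = [
        b"the quick brown fox jumps over the lazy dog " * 20,
        b"the quick brown fox leaps over the lazy cat " * 20,
        b"<html><body><p>0123456789</p></body></html>" * 20,
    ]
    for row in ncd_matrix(docs):
        print(["%.3f" % d for d in row])
```

The resulting matrix can be passed to any clustering method that accepts precomputed distances. Real compressors only approximate the ideal behaviour, so values slightly above 1 or above 0 for identical inputs are to be expected.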
Similar resources
A New Method for Clustering Heterogeneous Data: Clustering by Compression
Nowadays, we have to deal with a large quantity of unstructured data, produced by a number of sources. For example, clustering web pages is essential to getting structured information in response to user queries. In this paper, we intend to test the results of a new clustering technique – clustering by compression – when applied to heterogeneous sets of data. The clustering by compression proce...
A Clustering Approach by SSPCO Optimization Algorithm Based on Chaotic Initial Population
The main task of clustering analysis is to assign a set of objects to groups such that the objects in one group, or cluster, are more similar to each other than to the objects of other clusters. The SSPCO optimization algorithm is a new optimization algorithm inspired by the behavior of a type of bird called the see-see partridge. One of the things that smart algorithms are applied to solve is the problem ...
An Improved SSPCO Optimization Algorithm for Solve of the Clustering Problem
Swarm Intelligence (SI) is an innovative artificial intelligence technique for solving complex optimization problems. Data clustering is the process of grouping data into a number of clusters. The goal of data clustering is to make the data in the same cluster share a high degree of similarity while being very dissimilar to data from other clusters. Clustering algorithms have been applied to a ...
A “Blind” Approach to Clustering Through Data Compression, Bruno Carpentieri
Data compression, data prediction, data classification, learning and data mining are all facets of the same (multidimensional) coin. In particular, it is possible to use data compression as a metric for clustering. In this paper we test a clustering method that does not rely on any knowledge or theoretical analysis of the problem domain, but relies only on general-purpose compression techniqu...